1 Introduction

Hidden Markov Models (HMMs) form a wide class of discrete-time stochastic processes, used in different areas such as speech recognition (Juang and Rabiner (1991)), neurophysiology (Fredkin and Rice (1987)), biology (Churchill (1989)), and time series analysis (De Jong and Shephard (1995); Chan and Ledolter (1995); see MacDonald and Zucchini (1997) and the references therein).



The motivation for finding conditions implying the non singularity of the limiting Fisher information matrix is intrinsically linked with the asymptotic properties of the maximum likelihood estimator (MLE) in those models. Since the way of dealing with consistency and asymptotic normality of the MLE is not yet unified for hidden Markov models, we first give a quick overview of the different types of techniques that were developed around the MLE properties.

Most works on maximum likelihood estimation in such models have focused on iterative numerical methods, suitable for approximating the maximum likelihood estimator. By contrast, the statistical issues regarding the asymptotic properties of the maximum likelihood estimator itself have been largely ignored until recently. Baum and Petrie (1966) have shown the consistency and asymptotic normality of the maximum likelihood estimator in the particular case where both the observed and the latent variables take only finitely many values. These results have been extended recently in a series of papers by Leroux (1992), Bickel et al. (1998), Jensen and Petersen (1999) and Douc et al. (2004). The latter authors generalize the method followed by Jensen and Petersen (1999) to switching autoregressive models associated to a possibly non finite hidden state space, the observations belonging to a general topological space. Their method puts the consistency and the asymptotic normality in a common framework where some "stationary approximation" is performed under uniform ergodicity of the hidden Markov chain. This stringent assumption seems hard to check in a non compact state space. Nevertheless, to the best of our knowledge, the assumptions used in Douc et al. (2004) are the weakest known in the hidden Markov models literature for proving these asymptotic results, even if the extension to a non compact state space is still an open question.



Another approach was initiated by Le Gland and Mevel (2000). They independently developed a different technique to prove the consistency and the asymptotic normality of the MLE (Mevel 1997) for hidden Markov models with finite hidden state space. The work of Le Gland and Mevel (2000), later generalised to a non finite state space by Douc and Matias, is based on the remark that the likelihood can be expressed as an additive function of an "extended Markov chain". They show that under appropriate conditions, this extended chain is in some sense geometrically ergodic, once again under the assumption of uniform ergodicity for the hidden Markov chain. Nevertheless, even if the ergodicity of the extended Markov chain may be of independent interest, the assumptions used in this approach are stronger than in Douc et al. (2004).



However, in all these papers, the asymptotic normality of the MLE is derived from the consistency property thanks to the non singularity of the limiting Fisher information matrix. Indeed, whatever approach is considered ("extended Markov chain" method or "stationary approximation" approach), the asymptotic normality of the MLE is obtained through a Taylor expansion of the gradient of the log-likelihood around the true value of the parameter. The resulting equation is then transformed by inversion of the Fisher information matrix associated to $N$ observations so as to isolate the quantity of interest $\sqrt{N}(\hat\theta_{MV} - \theta^*)$ (where $\hat\theta_{MV}$ is the maximum likelihood estimator and $\theta^*$ the true value). The last step consists in considering the asymptotic behavior of the obtained equation as the number of observations grows to infinity; in particular, a crucial feature is that the normalised information matrix should converge to a non singular matrix.

Unfortunately, the asymptotic Fisher information matrix $I(\theta) = -\mathbb{E}_\theta\left(\nabla_\theta^2 \log p_\theta(Y_1 \mid Y_{-\infty}^0)\right)$ is the expectation of a quantity which depends on all the previous observations. In this form, the non singularity of this matrix is hard to check directly. The aim of this paper is to show that this non singularity is equivalent to the non singularity of some $I_{Y_{1:n}}(\theta) = -\mathbb{E}_\theta\left(\nabla_\theta^2 \log p_\theta(Y_{1:n})\right)$, where the expectation only concerns a finite number of observations. One thus might expect that the non singularity of $I_{Y_{1:n}}(\theta)$ is easier to check than that of $I(\theta)$. This is a simple result, but we expect that it helps in checking such an intractable non singularity assumption. The rest of the paper is organised as follows: we introduce the model and the assumptions in Section 2. In Section 3, after recalling some properties of the MLE, we state and prove the main result of the paper using a technical proposition. Finally, Section 4 is devoted to the proof of this technical proposition.



2 Model and assumptions 



In the following, the assumptions on the model and the description of the asymptotic results concerning the MLE directly derive from the paper of Douc et al. (2004).

Let $\{X_n\}_{n=0}^\infty$ be a Markov chain on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$. We denote by $\{Q_\theta(x, A),\ x \in \mathsf{X},\ A \in \mathcal{B}(\mathsf{X})\}$ the Markov transition kernel of the chain. We also let $\{Y_n\}_{n=0}^\infty$ be a sequence of random variables in $(\mathsf{Y}, \mathcal{B}(\mathsf{Y}))$ such that, conditional on $\{X_n\}_{n=0}^\infty$, $\{Y_n\}_{n=0}^\infty$ is a sequence of conditionally independent random variables, $Y_n$ having conditional density $g_\theta(y \mid X_n)$ with respect to some $\sigma$-finite measure $\nu$ on the Borel $\sigma$-field $\mathcal{B}(\mathsf{Y})$. Usually, $\mathsf{X}$ and $\mathsf{Y}$ are subsets of $\mathbb{R}^s$ and $\mathbb{R}^t$ respectively, but they may also be higher dimensional spaces. Moreover, both $Q_\theta$ and $g_\theta$ depend on a parameter $\theta$ in $\Theta$, where $\Theta$ is a compact subset of $\mathbb{R}^p$. The true parameter value will be denoted $\theta^*$ and is assumed to be in the interior of $\Theta$.

Assume that for any $x \in \mathsf{X}$, $Q_\theta(x, \cdot)$ has a density $q_\theta(x, \cdot)$ with respect to the same $\sigma$-finite dominating measure $\mu$ on $\mathsf{X}$. For any $k \geq 1$, the density of $Q_\theta^k(x, \cdot)$ with respect to $\mu$ is denoted by $q_\theta^k(x, \cdot)$. In the following, for $m \geq n$, denote by $Y_n^m$ the family of random variables $(Y_n, \ldots, Y_m)$. Moreover, for any measurable function $f$ on $(\mathsf{X}, \mathcal{B}(\mathsf{X}), \mu)$, denote $\operatorname{ess\,sup}(f) = \inf\{M \geq 0,\ \mu(\{M < |f|\}) = 0\}$ and, if $f$ is non-negative, $\operatorname{ess\,inf}(f) = \sup\{M \geq 0,\ \mu(\{M > f\}) = 0\}$ (with obvious conventions if those sets are empty). By convention, we simply write $\sup$ (resp. $\inf$) instead of $\operatorname{ess\,sup}$ (resp. $\operatorname{ess\,inf}$).

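We denote by $\pi_\theta$ the stationary distribution of the kernel $Q_\theta$ when it exists. Let $(Z_n = (X_n, Y_n))_{n \geq 0}$ be the Markov chain on $((\mathsf{X} \times \mathsf{Y})^{\mathbb{N}}, (\mathcal{B}(\mathsf{X}) \otimes \mathcal{B}(\mathsf{Y}))^{\otimes \mathbb{N}})$ of transition kernel $P_\theta$ defined, for $z = (x, y)$, by

\[
P_\theta(z, A) := \int\!\!\int \mathbf{1}_A(x', y')\, Q_\theta(x, dx')\, g_\theta(y' \mid x')\, \nu(dy').
\]

$\mathbb{P}_{\theta,\lambda}$ (resp. $\mathbb{E}_{\theta,\lambda}$) denotes the probability (resp. expectation) induced by the Markov chain $(Z_n)_{n \geq 0}$ of transition kernel $P_\theta$ and initial distribution $\lambda(dx)\, g_\theta(y \mid x)\, \nu(dy)$. By convention, we simply write $\mathbb{P}_{\theta,x} := \mathbb{P}_{\theta,\delta_x}$ and $\mathbb{P}_\theta := \mathbb{P}_{\theta,\pi_\theta}$, and $\mathbb{E}_{\theta,x}$ and $\mathbb{E}_\theta$ will be the associated expectations. Moreover, $\pi_\theta$ will also denote the density of the stationary distribution with respect to $\mu$.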



In this paper, $\|\cdot\|$ denotes the $L^2$ norm in $\mathbb{R}^p$, i.e. for any $\varphi \in \mathbb{R}^p$, $\|\varphi\| = \sqrt{\varphi^T \varphi}$. By abuse of notation, $\|\cdot\|$ will also denote the associated $L^2$-norm in the space of symmetric matrices in $\mathbb{R}^p \times \mathbb{R}^p$, i.e. for any real $p \times p$ matrix $J$, $\|J\| := \sup_{\varphi, \|\varphi\| = 1} |\varphi^T J \varphi|$. For all bounded measurable functions $f$, define $\|f\|_\infty := \sup_{x \in \mathsf{X}} |f(x)|$. $C$ will denote an unspecified finite constant which may take different values upon each appearance. In the following, we will use $p_\theta$ as a generic symbol for density. When this density explicitly depends on $\pi_\theta$, we stress it by writing $\bar p_\theta$ instead of $p_\theta$. In case of ambiguity, we will define precisely the density $p_\theta$.
Consistency assumptions. We recall the assumptions used in Douc et al. (2004) to obtain consistency for a switching autoregressive model. Since we consider here a hidden Markov model, we adapt the statement of their assumptions.

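(A1) (a) $0 < \sigma_- := \inf_{\theta \in \Theta} \inf_{x, x' \in \mathsf{X}} q_\theta(x, x')$ and $\sigma_+ := \sup_{\theta \in \Theta} \sup_{x, x' \in \mathsf{X}} q_\theta(x, x') < \infty$.
(b) For all $y \in \mathsf{Y}$, $0 < \inf_{\theta \in \Theta} \int_{\mathsf{X}} g_\theta(y \mid x)\, \mu(dx)$ and $\sup_{\theta \in \Theta} \int_{\mathsf{X}} g_\theta(y \mid x)\, \mu(dx) < \infty$.

(A2) $b_+ := \sup_\theta \sup_{y, x} g_\theta(y \mid x) < \infty$ and $\mathbb{E}_{\theta^*}(|\log b_-(Y_1)|) < \infty$, where $b_-(y) := \inf_\theta \int_{\mathsf{X}} g_\theta(y \mid x)\, \mu(dx)$.

(A3) $\theta = \theta^*$ if and only if $\mathbb{P}_\theta^Y = \mathbb{P}_{\theta^*}^Y$, where $\mathbb{P}_\theta^Y$ is the trace of $\mathbb{P}_\theta$ on $\{\mathsf{Y}^{\mathbb{N}}, \mathcal{B}(\mathsf{Y})^{\otimes \mathbb{N}}\}$, that is the distribution of $\{Y_k\}$.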

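Asymptotic normality assumptions. Some additional assumptions are needed for the asymptotic normality of the MLE. We will assume that there exists a positive real $\delta$ such that on $G := \{\theta \in \Theta : \|\theta - \theta^*\| < \delta\}$, the following conditions hold.

(A4) For all $x, x' \in \mathsf{X}$ and $y' \in \mathsf{Y}$, the functions $\theta \mapsto q_\theta(x, x')$ and $\theta \mapsto g_\theta(y' \mid x')$ are twice continuously differentiable on $G$.

(A5) (a) $\sup_{\theta \in G} \sup_{x, x'} \|\nabla_\theta \log q_\theta(x, x')\| < \infty$, $\sup_{\theta \in G} \sup_{x, x'} \|\nabla_\theta^2 \log q_\theta(x, x')\| < \infty$.
(b) $\mathbb{E}_{\theta^*}\left[\sup_{\theta \in G} \sup_x \|\nabla_\theta \log g_\theta(Y_1 \mid x)\|^2\right] < \infty$, $\mathbb{E}_{\theta^*}\left[\sup_{\theta \in G} \sup_x \|\nabla_\theta^2 \log g_\theta(Y_1 \mid x)\|\right] < \infty$.

(A6) (a) For $\nu$-almost all $y$ in $\mathsf{Y}$ there exists a function $f_y : \mathsf{X} \to \mathbb{R}_+$ in $L^1(\mu)$ such that $\sup_{\theta \in G} g_\theta(y \mid x) \leq f_y(x)$.
(b) For $\mu$-almost all $x \in \mathsf{X}$, there exist functions $f_x^1 : \mathsf{Y} \to \mathbb{R}_+$ and $f_x^2 : \mathsf{Y} \to \mathbb{R}_+$ in $L^1(\nu)$ such that $\|\nabla_\theta g_\theta(y \mid x)\| \leq f_x^1(y)$ and $\|\nabla_\theta^2 g_\theta(y \mid x)\| \leq f_x^2(y)$ for all $\theta \in G$.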

3 Main result 

We first recall the results of consistency and asymptotic normality of the MLE obtained by Douc et al. (2004). Write $\hat\theta_{n,x_0}$ for the maximum likelihood estimator associated to the initial condition $X_0 \sim \delta_{x_0}$, where $\delta_{x_0}$ is the Dirac mass centered at $x_0$.

Theorem 1. Assume (A1)-(A3). Then, for any $x_0 \in \mathsf{X}$,

\[
\lim_{n \to \infty} \hat\theta_{n,x_0} = \theta^* \qquad \mathbb{P}_{\theta^*}\text{-a.s.}
\]
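Define the stationary density $\bar p_\theta$ and the Fisher information matrix $I_{Y_1^n}(\theta)$ associated to $n$ observations by:

\[
\bar p_\theta(Y_1^n) := \int \cdots \int \pi_\theta(dx_1) \prod_{i=1}^{n-1} q_\theta(x_i, x_{i+1}) \prod_{i=1}^{n} g_\theta(Y_i \mid x_i)\, \mu^{\otimes (n-1)}(dx_2^n),
\]
\[
I_{Y_1^n}(\theta) := -\mathbb{E}_\theta\left(\nabla_\theta^2 \log \bar p_\theta(Y_1^n)\right).
\]

Douc et al. (2004) (Section 6.2) proved that $\lim_{n \to \infty} -\mathbb{E}_\theta(\nabla_\theta^2 \log \bar p_\theta(Y_1^n))/n$ exists and, denoting this limit by $I(\theta)$, they obtained that the asymptotic Fisher information matrix $I(\theta)$ may also be written as

\[
I(\theta) = \lim_{n \to \infty} -\mathbb{E}_\theta\left(\nabla_\theta^2 \log \bar p_\theta(Y_1^n)\right)/n = \lim_{m \to \infty} \mathbb{E}_\theta\left(-\nabla_\theta^2 \log \bar p_\theta(Y_1 \mid Y_{-m}^0)\right). \qquad (1)
\]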

We recall the asymptotic normality of the MLE obtained by Douc et al. (2004).

Theorem 2. Assume (A1)-(A6). Then, provided that $I(\theta^*)$ is non singular, we have for any $x_0 \in \mathsf{X}$,

\[
\sqrt{n}\,(\hat\theta_{n,x_0} - \theta^*) \to \mathcal{N}(0, I(\theta^*)^{-1}) \qquad \mathbb{P}_{\theta^*}\text{-weakly.}
\]

We may now state the main result of the paper, which links the asymptotic Fisher information matrix with the stationary information matrix associated to a finite number of observations.

Theorem 3. Assume (A1)-(A6). Then, $I(\theta^*)$ is non singular if and only if there exists $n \geq 1$ such that $I_{Y_1^n}(\theta^*)$ is non singular.
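To make the finite-sample criterion of Theorem 3 concrete, here is a minimal numerical sketch under assumptions that are not in the paper: a two-state hidden chain with transition probabilities $\theta = (a, b)$, binary observations with known emission probabilities `E0` and `E1`, and the counting measure as dominating measure. All function names are hypothetical. The sketch computes $I_{Y_1^n}(\theta)$ as the expected outer product of the score (which coincides with $-\mathbb{E}_\theta(\nabla_\theta^2 \log \bar p_\theta(Y_1^n))$ in this regular finite setting), using finite differences and exact enumeration of all observation sequences.

```python
import itertools
import math

# Assumed (illustrative) emission probabilities P(Y=1 | X=0) and P(Y=1 | X=1).
E0, E1 = 0.1, 0.8

def emission(y, x):
    """g(y | x) for a binary observation y and hidden state x."""
    p_one = E1 if x == 1 else E0
    return p_one if y == 1 else 1.0 - p_one

def stationary_likelihood(theta, ys):
    """Stationary likelihood of the sequence ys for theta = (a, b)."""
    a, b = theta
    q = [[1.0 - a, a], [b, 1.0 - b]]      # hidden transition matrix
    pi = [b / (a + b), a / (a + b)]       # its stationary distribution
    # Forward recursion: alpha[x] = p(y_1, ..., y_t, X_t = x).
    alpha = [pi[x] * emission(ys[0], x) for x in (0, 1)]
    for y in ys[1:]:
        alpha = [sum(alpha[xp] * q[xp][x] for xp in (0, 1)) * emission(y, x)
                 for x in (0, 1)]
    return sum(alpha)

def score(theta, ys, h=1e-5):
    """Central finite-difference gradient of the stationary log-likelihood."""
    grad = []
    for j in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[j] += h
        dn[j] -= h
        diff = math.log(stationary_likelihood(up, ys)) - math.log(stationary_likelihood(dn, ys))
        grad.append(diff / (2.0 * h))
    return grad

def fisher_information(theta, n):
    """I_{Y_1^n}(theta) as the exact expectation of score * score^T."""
    info = [[0.0, 0.0], [0.0, 0.0]]
    for ys in itertools.product((0, 1), repeat=n):
        p = stationary_likelihood(theta, ys)
        s = score(theta, ys)
        for i in range(2):
            for j in range(2):
                info[i][j] += p * s[i] * s[j]
    return info

def det2x2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]
```

In this toy model the law of a single observation depends on $(a, b)$ only through $a/(a+b)$, so `det2x2(fisher_information((0.3, 0.2), 1))` is numerically zero, whereas `det2x2(fisher_information((0.3, 0.2), 2))` is clearly positive. This is the situation covered by the theorem: non singularity for one $n$ (here $n = 2$) is enough.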

Note that this theorem holds under the same assumptions as in Theorem 2. Of course, since the true parameter is not known, the non singularity of $I_{Y_1^n}(\theta)$ should be checked for all $\theta$. More precisely, if the stationary distribution $\pi_\theta$ of the hidden Markov chain is sufficiently known so that for any parameter $\theta$, the Fisher information matrix $I_{Y_1^n}(\theta)$ is shown to be non singular (for some $n$), then the sufficient condition of Theorem 3 ensures that the asymptotic Fisher information matrix is non singular. Thus, the MLE is asymptotically normal by applying Theorem 2. Before proving the necessary and sufficient condition of Theorem 3, we need a technical proposition about some asymptotics of the Fisher information matrix. Let

\[
I_{Y_1^n \mid Y_{-k-m}^{-k}}(\theta) := -\mathbb{E}_\theta\left(\nabla_\theta^2 \log \bar p_\theta(Y_1^n \mid Y_{-k-m}^{-k})\right).
\]

Proposition 1. Assume (A1)-(A6). Then, for all $n \geq 1$,

\[
\lim_{k \to \infty} \sup_m \left\| I_{Y_1^n \mid Y_{-k-m}^{-k}}(\theta^*) - I_{Y_1^n}(\theta^*) \right\| = 0.
\]
While this result seems intuitive, since the ergodicity of $(X_k)$ implies the asymptotic independence of $(Y_1^n)$ (where $n$ is fixed) with respect to $\sigma\{Y_{-l};\ l \geq k\}$, the rigorous proof of this proposition is rather technical and is thus postponed to Section 4. Using this proposition, Theorem 3 may now be proved with elementary arguments.

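Proof. (Theorem 3) By Eq. (1),

\[
\det I(\theta^*) = \det \lim_{n \to \infty} \frac{I_{Y_1^n}(\theta^*)}{n} = \lim_{n \to \infty} \det\left(\frac{I_{Y_1^n}(\theta^*)}{n}\right).
\]

Thus, if $I_{Y_1^n}(\theta^*)$ is singular for all $n$, then $I(\theta^*)$ is singular. Now, assume that $I(\theta^*)$ is singular. Fix some $n \geq 1$ and let $k \geq n$. By stationarity of the sequence $(Y_i)_{i \in \mathbb{Z}}$ under $\mathbb{P}_{\theta^*}$ and elementary properties of the Fisher information matrix,

\[
\begin{aligned}
k I(\theta^*) &= k \lim_{m \to \infty} \mathbb{E}_{\theta^*}\left(-\nabla_\theta^2 \log \bar p_{\theta^*}(Y_1 \mid Y_{-m}^0)\right), \\
&= \lim_{m \to \infty} \sum_{i=1}^{k} \mathbb{E}_{\theta^*}\left(-\nabla_\theta^2 \log \bar p_{\theta^*}(Y_i \mid Y_{-m+i-1}^{i-1})\right), \\
&= \lim_{m \to \infty} \mathbb{E}_{\theta^*}\left(-\nabla_\theta^2 \log \bar p_{\theta^*}(Y_1^k \mid Y_{-m}^0)\right), \\
&= \lim_{m \to \infty} \left[ \mathbb{E}_{\theta^*}\left(-\nabla_\theta^2 \log \bar p_{\theta^*}(Y_{k-n+1}^k \mid Y_{-m}^0)\right) + \mathbb{E}_{\theta^*}\left(-\nabla_\theta^2 \log \bar p_{\theta^*}(Y_1^{k-n} \mid Y_{k-n+1}^k, Y_{-m}^0)\right) \right], \\
&\geq \limsup_{m \to \infty} \mathbb{E}_{\theta^*}\left(-\nabla_\theta^2 \log \bar p_{\theta^*}(Y_{k-n+1}^k \mid Y_{-m}^0)\right) = \limsup_{m \to \infty} \mathbb{E}_{\theta^*}\left(-\nabla_\theta^2 \log \bar p_{\theta^*}(Y_1^n \mid Y_{-k+n-m}^{-k+n})\right).
\end{aligned}
\]

Let $\varphi \in \mathbb{R}^p$ be such that $I(\theta^*)\varphi = 0$. Then, by the above inequality, for all $k \geq n$,

\[
\varphi^T \limsup_{m \to \infty} \mathbb{E}_{\theta^*}\left(-\nabla_\theta^2 \log \bar p_{\theta^*}(Y_1^n \mid Y_{-k+n-m}^{-k+n})\right) \varphi = 0.
\]

Now, letting $k \to \infty$ and using Proposition 1, we get $I_{Y_1^n}(\theta^*)\varphi = 0$. Thus, $I_{Y_1^n}(\theta^*)$ is singular for any $n \geq 1$. The proof is completed.

□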

4 Proof of Proposition 1

Regularity of the stationary distribution. We first check that $\theta \mapsto \pi_\theta(f)$ is twice differentiable and obtain a closed form expression of $\nabla_\theta \pi_\theta(f)$. By (A1), the Markov chain $\{X_k\}_{k \geq 0}$ is uniformly ergodic and thus the Poisson equation associated to $f$:

\[
V(x) - Q_{\theta^*}(x, V) = f(x) - \pi_{\theta^*}(f) \quad \text{where } \pi_{\theta^*}(|f|) < \infty,
\]
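admits a unique solution that we denote by $V_{\theta^*}$. Classically, since for all $(x, A)$ in $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$, $Q_{\theta^*}(x, A) \geq \sigma_- \mu(A)$, we have for all bounded measurable $f$,

\[
|Q_{\theta^*}^k(x, f) - \pi_{\theta^*}(f)| \leq 2 \|f\|_\infty (1 - \sigma_-)^k,
\]

and then

\[
\forall x \in \mathsf{X}, \quad |V_{\theta^*}(x)| = \left| \sum_{k=0}^{\infty} \left( Q_{\theta^*}^k(x, f) - \pi_{\theta^*}(f) \right) \right| \leq 2\, \frac{\|f\|_\infty}{\sigma_-}. \qquad (2)
\]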
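The geometric bound and the bound (2) on the Poisson solution can be checked numerically on a toy example. The sketch below assumes a finite state space with the counting measure as $\mu$ (so that $\sigma_-$ is the smallest entry of the transition matrix); the matrix, the function `f` and all helper names are illustrative choices, not taken from the paper. The Poisson solution is approximated by truncating the series $\sum_k (Q^k f - \pi(f))$.

```python
def apply_kernel(q, f):
    """Return the vector (Q f)(x) = sum_x' q[x][x'] f[x']."""
    return [sum(q[x][xp] * f[xp] for xp in range(len(f))) for x in range(len(q))]

def stationary_distribution(q, iters=2000):
    """Approximate the stationary distribution by iterating pi <- pi Q."""
    size = len(q)
    pi = [1.0 / size] * size
    for _ in range(iters):
        pi = [sum(pi[x] * q[x][xp] for x in range(size)) for xp in range(size)]
    return pi

def check_ergodicity_bounds(q, f, kmax=60):
    """Check |Q^k f - pi(f)| <= 2 ||f|| (1 - sigma_minus)^k for k < kmax.

    Returns (all_bounds_hold, sup_x |V(x)|, 2 ||f|| / sigma_minus), where V is
    the truncated series approximating the Poisson solution.
    """
    sigma_minus = min(min(row) for row in q)
    pi = stationary_distribution(q)
    pi_f = sum(p * v for p, v in zip(pi, f))
    f_norm = max(abs(v) for v in f)
    qk_f = list(f)                      # Q^0 f = f
    poisson = [0.0] * len(f)
    all_hold = True
    for k in range(kmax):
        gap = max(abs(v - pi_f) for v in qk_f)
        all_hold = all_hold and gap <= 2.0 * f_norm * (1.0 - sigma_minus) ** k + 1e-12
        poisson = [acc + (v - pi_f) for acc, v in zip(poisson, qk_f)]
        qk_f = apply_kernel(q, qk_f)
    return all_hold, max(abs(v) for v in poisson), 2.0 * f_norm / sigma_minus

# Illustrative three-state chain; every entry is at least sigma_minus = 0.2.
Q_EXAMPLE = [[0.5, 0.3, 0.2],
             [0.2, 0.6, 0.2],
             [0.3, 0.3, 0.4]]
F_EXAMPLE = [1.0, -2.0, 0.5]
```

Calling `check_ergodicity_bounds(Q_EXAMPLE, F_EXAMPLE)` returns a triple whose first entry tells whether the geometric bound held for every tested $k$, and whose last two entries compare $\sup_x |V(x)|$ with the right-hand side of (2).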



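Note that for all $\theta \in \Theta$ and for all bounded measurable functions $f$,

\[
\begin{aligned}
\pi_\theta(f) - \pi_{\theta^*}(f) &= \int \pi_\theta(dx)\left(f(x) - \pi_{\theta^*}(f)\right) = \int \pi_\theta(dx)\left(V_{\theta^*}(x) - Q_{\theta^*}(x, V_{\theta^*})\right), \\
&= \pi_\theta (Q_\theta - Q_{\theta^*}) V_{\theta^*}, \qquad (3)
\end{aligned}
\]

where we have used that $\pi_\theta$ is the invariant probability measure for the transition kernel $Q_\theta$. Under the assumptions of Section 2, Eq. (3) implies that

\[
\nabla_\theta \pi_{\theta^*}(f) = \int\!\!\int \pi_{\theta^*}(dx)\, \nabla_\theta q_{\theta^*}(x, x')\, V_{\theta^*}(x')\, \mu(dx') = \mathbb{E}_{\theta^*}\left\{ \nabla_\theta \log q_{\theta^*}(X_0, X_1)\, V_{\theta^*}(X_1) \right\}.
\]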
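The derivative formula above can be sanity-checked on a case where everything is explicit. The sketch below assumes a two-state chain with transition matrix rows $(1-a, a)$ and $(b, 1-b)$, differentiates with respect to $a$ only, and compares the formula $\sum_{x,x'} \pi(x)\, \partial_a q(x,x')\, V(x')$ with a finite difference of $\pi_\theta(f)$. The model and all function names are illustrative assumptions; the Poisson solution is again approximated by a truncated series.

```python
def transition_matrix(a, b):
    """Two-state transition matrix with switching probabilities a and b."""
    return [[1.0 - a, a], [b, 1.0 - b]]

def stationary_pair(a, b):
    """Closed-form stationary distribution of the two-state chain."""
    return [b / (a + b), a / (a + b)]

def poisson_solution(q, pi, f, terms=500):
    """Truncated series V = sum_k (Q^k f - pi(f)) solving the Poisson equation."""
    pi_f = sum(p * v for p, v in zip(pi, f))
    solution = [0.0, 0.0]
    qk_f = list(f)
    for _ in range(terms):
        solution = [acc + (v - pi_f) for acc, v in zip(solution, qk_f)]
        qk_f = [sum(q[x][xp] * qk_f[xp] for xp in (0, 1)) for x in (0, 1)]
    return solution

def gradient_by_formula(a, b, f):
    """d/da of pi(f), computed as sum_x pi(x) sum_x' dq(x, x') V(x')."""
    q, pi = transition_matrix(a, b), stationary_pair(a, b)
    v = poisson_solution(q, pi, f)
    dq = [[-1.0, 1.0], [0.0, 0.0]]      # derivative of q with respect to a
    return sum(pi[x] * dq[x][xp] * v[xp] for x in (0, 1) for xp in (0, 1))

def gradient_by_finite_difference(a, b, f, h=1e-6):
    """Central finite difference of a -> pi(f)."""
    def mean(t):
        return sum(p * v for p, v in zip(stationary_pair(t, b), f))
    return (mean(a + h) - mean(a - h)) / (2.0 * h)
```

For this chain the derivative is available in closed form, $b\,(f(1) - f(0))/(a+b)^2$, so with $a = 0.3$, $b = 0.2$ and $f = (1, 4)$ both functions should return a value close to $2.4$.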



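Combining the uniform ergodicity ($\sigma_- = \inf_\theta \inf_{x, x'} q_\theta(x, x') > 0$) with the regularity assumptions on $q_\theta$, the conditions 1-3 of Heidergott and Hordijk (2003) are trivially satisfied and Theorem 4 of Heidergott and Hordijk (2003) ensures that $\theta \mapsto \pi_\theta(f)$ is twice differentiable. Moreover, applying Eq. (3) to the bounded function $f(\cdot) = q_{\theta^*}(\cdot, x)$, where $x$ is a fixed point in $\mathsf{X}$, yields that $\theta \mapsto \pi_\theta(x)$ is twice differentiable at $\theta = \theta^*$ and

\[
\nabla_\theta \pi_{\theta^*}(x) = \mathbb{E}_{\theta^*}\left[ \nabla_\theta \log q_{\theta^*}(X_0, X_1) \sum_{k=0}^{\infty} \left( q_{\theta^*}^k(X_1, x) - \pi_{\theta^*}(x) \right) \right]. \qquad (4)
\]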



Technical bounds. We will now state and prove some technical bounds that will be useful for the proof of Proposition 1. Lemma 9 in Douc et al. (2004) ensures the uniform forgetting of the initial distribution for the reverse a posteriori chain. It implies that for all $m \leq n$, $0 \leq k \leq n - m$,

\[
\forall A \in \mathcal{B}(\mathsf{X}), \quad \left| \mathbb{P}_\theta(X_{n-k} \in A \mid Y_m^n, X_n = x) - \mathbb{P}_\theta(X_{n-k} \in A \mid Y_m^n, X_n = x') \right| \leq \rho^k, \qquad (5)
\]

where $\rho := 1 - \sigma_-/\sigma_+$.

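Lemma 1. Under the assumptions of Proposition 1, we have the following inequalities.

(i) $\left\| \int \nabla_\theta \pi_{\theta^*}(x) f(x)\, \mu(dx) \right\| \leq C \|f\|_\infty$ with $C := 2 \sup_{x, x'} \|\nabla_\theta \log q_{\theta^*}(x, x')\| / \sigma_-$.

(ii) For all $k \geq 0$, there exists a random variable $D(Y_{-\infty}^{-k})$ satisfying $\mathbb{E}_{\theta^*}(D(Y_{-\infty}^{-k})^2) < \infty$ such that for all $m \geq 0$,

\[
\left\| \int \nabla_\theta \bar p_{\theta^*}(x_{-k+1} \mid Y_{-k-m}^{-k}) f(x_{-k+1})\, \mu(dx_{-k+1}) \right\| \leq D(Y_{-\infty}^{-k}) \|f\|_\infty.
\]

Moreover, for all $k \geq 0$, $\mathbb{E}_{\theta^*}(D(Y_{-\infty}^{-k})^2) = \mathbb{E}_{\theta^*}(D(Y_{-\infty}^{0})^2)$.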



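Proof. Combining Eq. (2) and Eq. (4) yields the first inequality. We will prove the second inequality with $k = 0$. It is actually sufficient to bound $\sup_{x_1} \|\nabla_\theta \log \bar p_{\theta^*}(x_1 \mid Y_{-m}^0)\|$. This can be done by using the Fisher identity:

\[
\nabla_\theta \log \bar p_{\theta^*}(X_1 \mid Y_{-m}^0) = \mathbb{E}_{\theta^*}\left( \nabla_\theta \log \bar p_{\theta^*}(X_{-m}^1, Y_{-m}^0) \mid X_1, Y_{-m}^0 \right) - \mathbb{E}_{\theta^*}\left( \nabla_\theta \log \bar p_{\theta^*}(X_{-m}^0, Y_{-m}^0) \mid Y_{-m}^0 \right),
\]

so that

\[
\begin{aligned}
\|\nabla_\theta \log \bar p_{\theta^*}(X_1 \mid Y_{-m}^0)\| \leq{} & \left\| \int \mathbb{E}_{\theta^*}\left\{ \nabla_\theta \log \pi_{\theta^*}(X_{-m}) + \sum_{i=-m}^{-1} \left[ \nabla_\theta \log q_{\theta^*}(X_i, X_{i+1}) + \nabla_\theta \log g_{\theta^*}(Y_i \mid X_i) \right] \,\Big|\, X_0, Y_{-m}^0 \right\} \right. \\
& \left. \times \left[ \mathbb{P}_{\theta^*}(dX_0 \mid X_1, Y_{-m}^0) - \mathbb{P}_{\theta^*}(dX_0 \mid Y_{-m}^0) \right] \right\| + 2\left[ \|\nabla_\theta \log q_{\theta^*}(\cdot, \cdot)\|_\infty + \|\nabla_\theta \log g_{\theta^*}(Y_0 \mid \cdot)\|_\infty \right].
\end{aligned}
\]

Thus, using Eq. (5), we get

\[
\begin{aligned}
\sup_{x_1} \|\nabla_\theta \log \bar p_{\theta^*}(x_1 \mid Y_{-m}^0)\| &\leq \|\nabla_\theta \log \pi_{\theta^*}\|_\infty\, \rho^m + \sum_{i=-\infty}^{0} \rho^{|i|} (1 + \mathbf{1}_{i=0}) \left[ \rho^{-1} \|\nabla_\theta \log q_{\theta^*}(\cdot, \cdot)\|_\infty + \|\nabla_\theta \log g_{\theta^*}(Y_i \mid \cdot)\|_\infty \right], \\
&\leq \|\nabla_\theta \log \pi_{\theta^*}\|_\infty + \sum_{i=-\infty}^{0} (1 + \mathbf{1}_{i=0})\, \rho^{|i|} \left[ \rho^{-1} \|\nabla_\theta \log q_{\theta^*}(\cdot, \cdot)\|_\infty + \|\nabla_\theta \log g_{\theta^*}(Y_i \mid \cdot)\|_\infty \right] =: D(Y_{-\infty}^0).
\end{aligned}
\]

Using (A5), it is straightforward that $\mathbb{E}_{\theta^*}(D(Y_{-\infty}^0)^2) < \infty$. Moreover, by stationarity of $(Y_k)$ under $\mathbb{P}_{\theta^*}$, $\mathbb{E}_{\theta^*}(D(Y_{-\infty}^{-k})^2) = \mathbb{E}_{\theta^*}(D(Y_{-\infty}^0)^2)$. The proof is completed. □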

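Define $p_\theta(Y_1^n \mid X_{-k} = x) := \int \cdots \int q_\theta^k(x, x_0) \prod_{i=1}^{n} q_\theta(x_{i-1}, x_i)\, g_\theta(Y_i \mid x_i)\, \mu^{\otimes (n+1)}(dx_0^n)$.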

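Lemma 2. Under the assumptions of Proposition 1, fix some $n \geq 1$. Then,

(i) For all $k \geq 0$,

\[
\frac{\sup_{x, x'} |p_{\theta^*}(Y_1^n \mid X_{-k} = x) - p_{\theta^*}(Y_1^n \mid X_{-k} = x')|}{\inf_x p_{\theta^*}(Y_1^n \mid X_0 = x)} \leq 2 (1 - \sigma_-)^k\, \frac{\sigma_+}{\sigma_-}.
\]

(ii) There exist $1 > \lambda > 1 - \sigma_-$ and a random variable $E(Y_1^n)$ satisfying $\mathbb{E}_{\theta^*}(E(Y_1^n)^2) < \infty$ such that for all $k \geq 0$,

\[
\frac{\sup_{x, x'} \|\nabla_\theta p_{\theta^*}(Y_1^n \mid X_{-k} = x) - \nabla_\theta p_{\theta^*}(Y_1^n \mid X_{-k} = x')\|}{\inf_x p_{\theta^*}(Y_1^n \mid X_0 = x)} \leq \lambda^k E(Y_1^n).
\]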
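Proof. Define $f_\theta(x) := p_\theta(Y_1^n \mid X_0 = x)$. First, note that

\[
\frac{\sup_x f_{\theta^*}(x)}{\inf_x f_{\theta^*}(x)} \leq \frac{\sup_x \int p_{\theta^*}(Y_1^n \mid X_1 = x_1)\, Q_{\theta^*}(x, dx_1)}{\inf_x \int p_{\theta^*}(Y_1^n \mid X_1 = x_1)\, Q_{\theta^*}(x, dx_1)} \leq \frac{\sigma_+}{\sigma_-}. \qquad (6)
\]

For all $x, x'$ in $\mathsf{X}$,

\[
\begin{aligned}
|p_{\theta^*}(Y_1^n \mid X_{-k} = x) - p_{\theta^*}(Y_1^n \mid X_{-k} = x')| &= \left| \int f_{\theta^*}(x_0) \left( Q_{\theta^*}^k(x, dx_0) - Q_{\theta^*}^k(x', dx_0) \right) \right| \\
&\leq 2 (1 - \sigma_-)^k \sup_x f_{\theta^*}(x),
\end{aligned}
\]

where we have used the uniform ergodicity of the chain $(X_n)$. Combining with (6) completes the proof of (i). Now, note that $p_\theta(Y_1^n \mid X_{-k} = x) = Q_\theta^k(x, f_\theta)$ and write

\[
\begin{aligned}
\nabla_\theta p_{\theta^*}(Y_1^n \mid X_{-k} = x) &= \nabla_\theta \left[ Q_\theta^k(x, f_\theta) \right]_{\theta = \theta^*} \\
&= \mathbb{E}_{\theta^*, x}\left( \nabla_\theta f_{\theta^*}(X_k) \right) + \mathbb{E}_{\theta^*, x}\left\{ f_{\theta^*}(X_k)\, \nabla_\theta \log \prod_{i=1}^{k} q_{\theta^*}(X_{i-1}, X_i) \right\}, \\
&= \mathbb{E}_{\theta^*, x}\left( \nabla_\theta f_{\theta^*}(X_k) \right) + \mathbb{E}_{\theta^*, x}\left\{ \left( f_{\theta^*}(X_k) - \pi_{\theta^*}(f_{\theta^*}) \right) \nabla_\theta \log \prod_{i=1}^{k} q_{\theta^*}(X_{i-1}, X_i) \right\}.
\end{aligned}
\]

Using that the chain $(X_i)$ is uniformly ergodic yields

\[
\begin{aligned}
& \|\nabla_\theta p_{\theta^*}(Y_1^n \mid X_{-k} = x) - \nabla_\theta p_{\theta^*}(Y_1^n \mid X_{-k} = x')\| \\
& \quad \leq 2 \sup_x \|\nabla_\theta f_{\theta^*}(x)\| (1 - \sigma_-)^k + 4 \sup_x f_{\theta^*}(x) \sum_{i=1}^{k} (1 - \sigma_-)^{k-i}\, \|\nabla_\theta \log q_{\theta^*}(\cdot, \cdot)\|_\infty\, (1 - \sigma_-)^{i-1}.
\end{aligned}
\]

Moreover, using the Fisher identity:

\[
\begin{aligned}
\sup_x \|\nabla_\theta f_{\theta^*}(x)\| &\leq \sup_x f_{\theta^*}(x)\, \sup_x \|\nabla_\theta \log p_{\theta^*}(Y_1^n \mid X_0 = x)\|, \\
&= \sup_x f_{\theta^*}(x)\, \sup_x \left\| \mathbb{E}_{\theta^*}\left( \nabla_\theta \log p_{\theta^*}(X_1^n, Y_1^n \mid X_0 = x) \mid Y_1^n, X_0 = x \right) \right\|, \\
&\leq \sup_x f_{\theta^*}(x) \left( \sum_{i=1}^{n} \left[ \|\nabla_\theta \log q_{\theta^*}(\cdot, \cdot)\|_\infty + \|\nabla_\theta \log g_{\theta^*}(Y_i \mid \cdot)\|_\infty \right] \right). \qquad (7)
\end{aligned}
\]

Combining with (6) completes the proof. □

As a consequence of Lemma 2, we have

Lemma 3. Under the assumptions of Proposition 1,

(i) There exists a random variable $F(Y_1^n)$ satisfying $\mathbb{E}_{\theta^*}(F(Y_1^n)^2) < \infty$ and

\[
\frac{\|\nabla_\theta \bar p_{\theta^*}(Y_1^n)\|}{\inf_x p_{\theta^*}(Y_1^n \mid X_0 = x)} \leq F(Y_1^n).
\]

(ii) There exists a constant $C$ such that for all $m, k \geq 0$,

\[
\frac{\|\nabla_\theta \bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k}) - \nabla_\theta \bar p_{\theta^*}(Y_1^n)\|}{\inf_x p_{\theta^*}(Y_1^n \mid X_0 = x)} \leq C \left( E(Y_1^n) + D(Y_{-\infty}^{-k}) \right) \lambda^k.
\]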
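Proof. As in the proof of Lemma 2, define $f_\theta(x) := p_\theta(Y_1^n \mid X_0 = x)$. Then,

\[
\begin{aligned}
\|\nabla_\theta \bar p_{\theta^*}(Y_1^n)\| &\leq \left\| \int \nabla_\theta f_{\theta^*}(x)\, \pi_{\theta^*}(dx) \right\| + \left\| \int f_{\theta^*}(x)\, \nabla_\theta \pi_{\theta^*}(X_0 = x)\, \mu(dx) \right\| \\
&\leq \sup_x \|\nabla_\theta f_{\theta^*}(x)\| + C \sup_x f_{\theta^*}(x),
\end{aligned}
\]

by Lemma 1(i). Combining with (6) and (7) completes the proof of (i). Now, write:

\[
\begin{aligned}
\|\nabla_\theta \bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k}) - \nabla_\theta \bar p_{\theta^*}(Y_1^n)\| \leq{} & \left\| \int \nabla_\theta p_{\theta^*}(Y_1^n \mid X_{-k}) \left( \mathbb{P}_{\theta^*}(dX_{-k} \mid Y_{-k-m}^{-k}) - \pi_{\theta^*}(dX_{-k}) \right) \right\| \\
& + \left\| \int p_{\theta^*}(Y_1^n \mid X_{-k} = x) \left( \nabla_\theta \bar p_{\theta^*}(X_{-k} = x \mid Y_{-k-m}^{-k}) - \nabla_\theta \pi_{\theta^*}(X_{-k} = x) \right) \mu(dx) \right\|.
\end{aligned}
\]

The first term of the right-hand side is bounded using Lemma 2(ii):

\[
\left\| \int \nabla_\theta p_{\theta^*}(Y_1^n \mid X_{-k}) \left( \mathbb{P}_{\theta^*}(dX_{-k} \mid Y_{-k-m}^{-k}) - \pi_{\theta^*}(dX_{-k}) \right) \right\| \leq \lambda^k E(Y_1^n) \inf_x p_{\theta^*}(Y_1^n \mid X_0 = x).
\]

For the second term, fix some $u$ in $\mathsf{X}$. Then, using Lemma 1 and Lemma 2(i),

\[
\begin{aligned}
& \left\| \int p_{\theta^*}(Y_1^n \mid X_{-k} = x) \left( \nabla_\theta \bar p_{\theta^*}(X_{-k} = x \mid Y_{-k-m}^{-k}) - \nabla_\theta \pi_{\theta^*}(X_{-k} = x) \right) \mu(dx) \right\| \\
& \quad = \left\| \int \left( p_{\theta^*}(Y_1^n \mid X_{-k} = x) - p_{\theta^*}(Y_1^n \mid X_{-k} = u) \right) \left( \nabla_\theta \bar p_{\theta^*}(X_{-k} = x \mid Y_{-k-m}^{-k}) - \nabla_\theta \pi_{\theta^*}(X_{-k} = x) \right) \mu(dx) \right\| \\
& \quad \leq 2 (1 - \sigma_-)^k \left( D(Y_{-\infty}^{-k}) + C \right) \frac{\sigma_+}{\sigma_-} \inf_x p_{\theta^*}(Y_1^n \mid X_0 = x),
\end{aligned}
\]

which completes the proof of (ii). □

Proof of Proposition 1

Proof. First, write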



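\[
\begin{aligned}
\left\| I_{Y_1^n}(\theta^*) - I_{Y_1^n \mid Y_{-k-m}^{-k}}(\theta^*) \right\| &= \sup_{u, \|u\| = 1} \left| \mathbb{E}_{\theta^*}\left\{ \left( u^T \nabla_\theta \log \bar p_{\theta^*}(Y_1^n) \right)^2 - \left( u^T \nabla_\theta \log \bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k}) \right)^2 \right\} \right| \\
&\leq \mathbb{E}_{\theta^*}\left\{ \|\nabla_\theta \log \bar p_{\theta^*}(Y_1^n) - \nabla_\theta \log \bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k})\| \, \|\nabla_\theta \log \bar p_{\theta^*}(Y_1^n) + \nabla_\theta \log \bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k})\| \right\}, \\
&\leq \left( \mathbb{E}_{\theta^*}\left\{ \|\nabla_\theta \log \bar p_{\theta^*}(Y_1^n) - \nabla_\theta \log \bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k})\|^2 \right\} \right)^{1/2} \\
&\quad \times \left( \mathbb{E}_{\theta^*}\left\{ \|\nabla_\theta \log \bar p_{\theta^*}(Y_1^n) + \nabla_\theta \log \bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k})\|^2 \right\} \right)^{1/2}.
\end{aligned}
\]

It is thus sufficient to prove that

\[
\lim_{k \to \infty} \sup_m \left( \mathbb{E}_{\theta^*} \|\nabla_\theta \log \bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k}) - \nabla_\theta \log \bar p_{\theta^*}(Y_1^n)\|^2 \right)^{1/2} = 0, \qquad (8)
\]
\[
\limsup_{k \to \infty} \sup_m \left( \mathbb{E}_{\theta^*} \|\nabla_\theta \log \bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k}) + \nabla_\theta \log \bar p_{\theta^*}(Y_1^n)\|^2 \right)^{1/2} < \infty. \qquad (9)
\]

We will just prove the first statement since (9) is directly implied by (8). Now,

\[
\begin{aligned}
& \|\nabla_\theta \log \bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k}) - \nabla_\theta \log \bar p_{\theta^*}(Y_1^n)\| \\
& \quad \leq \frac{\|\nabla_\theta \bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k}) - \nabla_\theta \bar p_{\theta^*}(Y_1^n)\|}{\bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k})} + \frac{\|\nabla_\theta \bar p_{\theta^*}(Y_1^n)\|}{\bar p_{\theta^*}(Y_1^n)} \, \frac{|\bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k}) - \bar p_{\theta^*}(Y_1^n)|}{\bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k})} \\
& \quad \leq \frac{\|\nabla_\theta \bar p_{\theta^*}(Y_1^n \mid Y_{-k-m}^{-k}) - \nabla_\theta \bar p_{\theta^*}(Y_1^n)\|}{\inf_x p_{\theta^*}(Y_1^n \mid X_0 = x)} + \frac{\|\nabla_\theta \bar p_{\theta^*}(Y_1^n)\|}{\bar p_{\theta^*}(Y_1^n)} \, \frac{\sup_{x, x'} |p_{\theta^*}(Y_1^n \mid X_{-k} = x) - p_{\theta^*}(Y_1^n \mid X_{-k} = x')|}{\inf_x p_{\theta^*}(Y_1^n \mid X_0 = x)},
\end{aligned}
\]

which implies (8), using Lemma 2(i) and Lemma 3(i) and (ii). The proof is completed. □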